Manipulation of FASTQ data with Galaxy

Authors

  • Daniel Blankenberg
  • Assaf Gordon
  • Gregory Von Kuster
  • Nathan Coraor
  • James Taylor
  • Anton Nekrutenko

Abstract

SUMMARY: Here, we describe a tool suite that works on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next-generation sequencing data, from the raw output of the sequencing machine all the way through the quality-filtering steps.

AVAILABILITY AND IMPLEMENTATION: This open-source toolset was implemented in Python and has been integrated into the online data analysis platform Galaxy (public web access: http://usegalaxy.org; download: http://getgalaxy.org). Two short movies that highlight the functionality of the tools described in this manuscript, as well as results from testing components of this tool suite against a set of previously published files, are available at http://usegalaxy.org/u/dan/p/fastq
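The abstract does not describe how the tools handle the different FASTQ variants, so the following is only a minimal sketch of the general idea, not the Galaxy tools' actual code. It assumes unwrapped FASTQ (exactly four lines per record) and the standard quality encodings: Sanger (Phred, ASCII offset 33), Illumina 1.3+ (Phred, offset 64) and Solexa (log-odds scores, offset 64). Each record is decoded, re-encoded as Sanger, and passed through a quality filter. The names `read_fastq`, `to_sanger` and `passes_filter`, and the filter rule (a minimum fraction of bases at or above a minimum quality), are hypothetical illustrations.

```python
import io
import math
from typing import Iterator, NamedTuple, TextIO

# ASCII offsets of the common FASTQ variants (standard conventions, assumed here).
OFFSETS = {"sanger": 33, "illumina": 64, "solexa": 64}


class FastqRecord(NamedTuple):
    name: str  # identifier without the leading "@"
    seq: str   # nucleotide sequence
    qual: str  # ASCII-encoded quality string, same length as seq


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ stream (exactly 4 lines per record)."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of input (a blank line is also treated as the end here)
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def solexa_to_phred(q: int) -> int:
    """Convert a Solexa log-odds score to the nearest Phred score."""
    return round(10 * math.log10(10 ** (q / 10) + 1))


def phred_scores(qual: str, variant: str) -> list[int]:
    """Decode a quality string of the given variant into Phred scores."""
    raw = [ord(c) - OFFSETS[variant] for c in qual]
    if variant == "solexa":
        return [solexa_to_phred(q) for q in raw]
    return raw


def to_sanger(rec: FastqRecord, variant: str) -> FastqRecord:
    """Re-encode a record's qualities as Sanger (Phred + 33)."""
    scores = phred_scores(rec.qual, variant)
    return rec._replace(qual="".join(chr(q + 33) for q in scores))


def passes_filter(rec: FastqRecord, min_quality: int, min_fraction: float) -> bool:
    """Hypothetical filter: keep a Sanger-encoded read if enough bases reach min_quality."""
    scores = phred_scores(rec.qual, "sanger")
    if not scores:
        return False
    good = sum(q >= min_quality for q in scores)
    return good / len(scores) >= min_fraction


# Usage: two Illumina 1.3+ reads, one high quality ("h" = Q40) and one low ("B" = Q2).
data = "@read1\nACGT\n+\nhhhh\n@read2\nACGT\n+\nBBBB\n"
records = [to_sanger(r, "illumina") for r in read_fastq(io.StringIO(data))]
kept = [r for r in records if passes_filter(r, min_quality=20, min_fraction=0.9)]
print([(r.name, r.qual) for r in kept])  # [('read1', 'IIII')]
```

Normalizing every input to Sanger before filtering means the downstream steps only ever deal with one encoding. This is one plausible way to "function on all commonly known variants", offered here as an assumption rather than as the authors' documented design.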

Similar resources

fqtools: an efficient software suite for modern FASTQ file manipulation

Many Next Generation Sequencing analyses involve the basic manipulation of input sequence data before downstream processing (e.g. searching for specific sequences, format conversion or basic file statistics). The rapidly increasing data volumes involved in NGS make any dataset manipulation a time-consuming and error-prone process. I have developed fqtools, a fast and reliable FASTQ f...

Bioclojure: a functional library for the manipulation of biological sequences

MOTIVATION BioClojure is an open-source library for the manipulation of biological sequence data written in the language Clojure. BioClojure aims to provide a functional framework for the processing of biological sequence data that provides simple mechanisms for concurrency and lazy evaluation of large datasets. RESULTS BioClojure provides parsers and accessors for a range of biological seque...

Measuring the optical depth profile of galaxy clusters using the kinetic Sunyaev-Zel'dovich effect

Baryonic matter distribution in the large-scale structures is one of the main questions in cosmology. This distribution can provide valuable information regarding the processes of galaxy formation and evolution. On the other hand, the missing baryon problem is still under debate. One of the most important cosmological structures for studying the rate and the distribution of the baryons is gal...

BEETL-fastq: a searchable compressed archive for DNA reads

MOTIVATION FASTQ is a standard file format for DNA sequencing data, which stores both nucleotides and quality scores. A typical sequencing study can easily generate hundreds of gigabytes of FASTQ files, while public archives such as ENA and NCBI and large international collaborations such as the Cancer Genome Atlas can accumulate many terabytes of data in this format. Compression tools such as ...

LFQC: a lossless compression algorithm for FASTQ files

MOTIVATION Next Generation Sequencing (NGS) technologies have revolutionized genomic research by reducing the cost of whole genome sequencing. One of the biggest challenges posed by modern sequencing technology is economic storage of NGS data. Storing raw data is infeasible because of its enormous size and high redundancy. In this article, we address the problem of storage and transmission of l...

Volume 26

Publication date: 2010